Drop AutoTuner recommendation for concurrentGpuTasks for apps using plugin >= 25.06#2090
Drop AutoTuner recommendation for concurrentGpuTasks for apps using plugin >= 25.06#2090parthosa wants to merge 2 commits intoNVIDIA:devfrom
Conversation
Starting with plugin 25.06, the RAPIDS plugin auto-tunes the number of concurrent GPU tasks based on memory usage (NVIDIA/spark-rapids#12374), so the AutoTuner should stop recommending `spark.rapids.sql.concurrentGpuTasks` for apps using that plugin version or later. Target cluster `enforced` and `preserve` overrides still take precedence. Fixes NVIDIA#2089 Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
03ebd43 to
1286dcc
Compare
Greptile SummaryThis PR drops the AutoTuner's Confidence Score: 5/5Safe to merge — logic is correct, all callers of the updated API are accounted for, and all edge cases are covered by tests. No P0 or P1 issues found. The compareVersions API change is a strict improvement and all callers are updated. The drop logic integrates cleanly with skippedRecommendations, ignoreRecommendation, and the initRecommendations / calculateClusterLevelRecommendations ordering. Five new tests provide thorough coverage. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[calculateClusterLevelRecommendations] --> B{isPropertyUserOverridden\nconcurrentGpuTasks?}
B -- enforced --> C[appendRecommendation\nenforced value via initRecommendations]
B -- preserved --> D[appendRecommendation\ncalculated value]
B -- neither --> E{isConcurrentGpuTasksAutoTunedByPlugin?}
E --> F[getRapidsPluginJarVersion\nfromRapidsJars + pluginJarRegEx]
F --> G{distinct versions found?}
G -- exactly one --> H[compareVersions\njarVer vs 25.06.0]
G -- zero or 2+ --> I[None → false]
H --> J{>= 0?}
J -- yes --> K[skippedRecommendations +=\nconcurrentGpuTasks\ndrops rec + missing comment]
J -- no or None --> L[appendRecommendation\ncalcGpuConcTasks]
I --> L
Reviews (2): Last reviewed commit: "Address review: option-typed compareVers..." | Re-trigger Greptile |
- Make `ToolUtils.compareVersions` return `Option[Int]` so a parse failure cannot masquerade as version equality. Update the two existing callers to handle `None` explicitly. - Add a `ProfilingAutoTunerSuite` test covering the `preserve` branch of `Platform.isPropertyUserOverridden` for the concurrentGpuTasks drop logic. Signed-off-by: Partho Sarthi <psarthi@nvidia.com>
Fixes #2089
Changes
Starting with plugin 25.06, the RAPIDS plugin auto-tunes the number of concurrent GPU tasks based on memory usage (NVIDIA/spark-rapids#12374). The AutoTuner should no longer recommend
spark.rapids.sql.concurrentGpuTasksfor apps using that plugin version or later.Logic (in
AutoTuner.calculateClusterLevelRecommendations):appInfoProvider.getRapidsJarsusing the existingpluginJarRegEx.>= 25.06.0, skip the recommendation by adding the key toskippedRecommendations. This also suppresses the "was not set" missing comment.enforcedandpreserveoverrides take precedence — recommendation is still emitted in those cases.New helper:
Platform.isPropertyUserOverridden(key)— returns true if the property is inenforcedorpreserve. Lives next togetUserEnforcedSparkProperty/isPropertyPreserved/isPropertyExcluded.Cleanup: dropped an unnecessary
.toIntoncalcGpuConcTasks()—appendRecommendationalready has aLongoverload.Testing
ProfilingAutoTunerSuite— 4 new tests:spark.rapids.sql.concurrentGpuTasksfor plugin 25.06.0enforcedvalue wins over the drop logicFull core test suite (669 tests across 29 suites) passes.